如今,基于CNN的架构在学习和提取功能方面的图像分类成功使它们如此受欢迎,但是当我们使用最先进的模型对嘈杂和低质量的图像进行分类时,图像分类的任务变得更加具有挑战性。为了解决这个问题,我们提出了一种新颖的图像分类体系结构,该体系结构以模糊和嘈杂的低分辨率图像学习细节。为了构建我们的新块,我们使用了RES连接和Inception模块想法的想法。使用MNIST数据集,我们进行了广泛的实验,表明引入的体系结构比其他最先进的卷积神经网络更准确,更快。由于我们的模型的特殊特征,它可以通过更少的参数获得更好的结果。
translated by 谷歌翻译
Research on automated essay scoring has become increasing important because it serves as a method for evaluating students' written-responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods than can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition that were then classified using a scoring model that was training with the bidirectional encoder representations from transformer language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
translated by 谷歌翻译
Osteoarthritis (OA) is the most prevalent chronic joint disease worldwide, where knee OA takes more than 80% of commonly affected joints. Knee OA is not a curable disease yet, and it affects large columns of patients, making it costly to patients and healthcare systems. Etiology, diagnosis, and treatment of knee OA might be argued by variability in its clinical and physical manifestations. Although knee OA carries a list of well-known terminology aiming to standardize the nomenclature of the diagnosis, prognosis, treatment, and clinical outcomes of the chronic joint disease, in practice there is a wide range of terminology associated with knee OA across different data sources, including but not limited to biomedical literature, clinical notes, healthcare literacy, and health-related social media. Among these data sources, the scientific articles published in the biomedical literature usually make a principled pipeline to study disease. Rapid yet, accurate text mining on large-scale scientific literature may discover novel knowledge and terminology to better understand knee OA and to improve the quality of knee OA diagnosis, prevention, and treatment. The present works aim to utilize artificial neural network strategies to automatically extract vocabularies associated with knee OA diseases. Our finding indicates the feasibility of developing word embedding neural networks for autonomous keyword extraction and abstraction of knee OA.
translated by 谷歌翻译
Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent large-scale text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image. To benchmark existing models, we introduce a large-scale challenge dataset SR2D that contains sentences describing two objects and the spatial relationship between them. We construct and harness an automated evaluation pipeline that employs computer vision to recognize objects and their spatial relationships, and we employ it in a large-scale evaluation of T2I models. Our experiments reveal a surprising finding that, although recent state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations such as left/right/above/below. Our analyses demonstrate several biases and artifacts of T2I models such as the difficulty with generating multiple objects, a bias towards generating the first object mentioned, spatially inconsistent outputs for equivalent relationships, and a correlation between object co-occurrence and spatial understanding capabilities. We conduct a human study that shows the alignment between VISOR and human judgment about spatial understanding. We offer the SR2D dataset and the VISOR metric to the community in support of T2I spatial reasoning research.
translated by 谷歌翻译
The behavior of the network and its stability are governed by both dynamics of individual nodes as well as their topological interconnections. Attention mechanism as an integral part of neural network models was initially designed for natural language processing (NLP), and so far, has shown excellent performance in combining dynamics of individual nodes and the coupling strengths between them within a network. Despite undoubted impact of attention mechanism, it is not yet clear why some nodes of a network get higher attention weights. To come up with more explainable solutions, we tried to look at the problem from stability perspective. Based on stability theory, negative connections in a network can create feedback loops or other complex structures by allowing information to flow in the opposite direction. These structures play a critical role in the dynamics of a complex system and can contribute to abnormal synchronization, amplification, or suppression. We hypothesized that those nodes that are involved in organizing such structures can push the entire network into instability modes and therefore need higher attention during analysis. To test this hypothesis, attention mechanism along with spectral and topological stability analyses was performed on a real-world numerical problem, i.e., a linear Multi Input Multi Output state-space model of a piezoelectric tube actuator. The findings of our study suggest that the attention should be directed toward the collective behaviour of imbalanced structures and polarity-driven structural instabilities within the network. The results demonstrated that the nodes receiving more attention cause more instability in the system. Our study provides a proof of concept to understand why perturbing some nodes of a network may cause dramatic changes in the network dynamics.
translated by 谷歌翻译
Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
translated by 谷歌翻译
GTFLAT, as a game theory-based add-on, addresses an important research question: How can a federated learning algorithm achieve better performance and training efficiency by setting more effective adaptive weights for averaging in the model aggregation phase? The main objectives for the ideal method of answering the question are: (1) empowering federated learning algorithms to reach better performance in fewer communication rounds, notably in the face of heterogeneous scenarios, and last but not least, (2) being easy to use alongside the state-of-the-art federated learning algorithms as a new module. To this end, GTFLAT models the averaging task as a strategic game among active users. Then it proposes a systematic solution based on the population game and evolutionary dynamics to find the equilibrium. In contrast with existing approaches that impose the weights on the participants, GTFLAT concludes a self-enforcement agreement among clients in a way that none of them is motivated to deviate from it individually. The results reveal that, on average, using GTFLAT increases the top-1 test accuracy by 1.38%, while it needs 21.06% fewer communication rounds to reach the accuracy.
translated by 谷歌翻译
Machine learning algorithms have revolutionized different fields, including natural language processing, computer vision, signal processing, and medical data processing. Despite the excellent capabilities of machine learning algorithms in various tasks and areas, the performance of these models mainly deteriorates when there is a shift in the test and training data distributions. This gap occurs due to the violation of the fundamental assumption that the training and test data are independent and identically distributed (i.i.d). In real-world scenarios where collecting data from all possible domains for training is costly and even impossible, the i.i.d assumption can hardly be satisfied. The problem is even more severe in the case of medical images and signals because it requires either expensive equipment or a meticulous experimentation setup to collect data, even for a single domain. Additionally, the decrease in performance may have severe consequences in the analysis of medical records. As a result of such problems, the ability to generalize and adapt under distribution shifts (domain generalization (DG) and domain adaptation (DA)) is essential for the analysis of medical data. This paper provides the first systematic review of DG and DA on functional brain signals to fill the gap of the absence of a comprehensive study in this era. We provide detailed explanations and categorizations of datasets, approaches, and architectures used in DG and DA on functional brain images. We further address the attention-worthy future tracks in this field.
translated by 谷歌翻译
Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the quality of the data to improve the learning performance of the models rather than redesigning model architectures. In this paper, we propose CLIP i.e., Curriculum Learning with Iterative data Pruning. CLIP combines two data-centric approaches i.e., curriculum learning and dataset pruning to improve the model learning accuracy and convergence speed. The proposed scheme applies loss-aware dataset pruning to iteratively remove the least significant samples and progressively reduces the size of the effective dataset in the curriculum learning training. Extensive experiments performed on crowd density estimation models validate the notion behind combining the two approaches by reducing the convergence time and improving generalization. To our knowledge, the idea of data pruning as an embedded process in curriculum learning is novel.
translated by 谷歌翻译
Density estimation is one of the most widely used methods for crowd counting in which a deep learning model learns from head-annotated crowd images to estimate crowd density in unseen images. Typically, the learning performance of the model is highly impacted by the accuracy of the annotations and inaccurate annotations may lead to localization and counting errors during prediction. A significant amount of works exist on crowd counting using perfectly labelled datasets but none of these explore the impact of annotation errors on the model accuracy. In this paper, we investigate the impact of imperfect labels (both noisy and missing labels) on crowd counting accuracy. We propose a system that automatically generates imperfect labels using a deep learning model (called annotator) which are then used to train a new crowd counting model (target model). Our analysis on two crowd counting models and two benchmark datasets shows that the proposed scheme achieves accuracy closer to that of the model trained with perfect labels showing the robustness of crowd models to annotation errors.
translated by 谷歌翻译